-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-10026] [ML] [PySpark] Implement some common Params for regression in PySpark #8508
Conversation
Test build #41744 has finished for PR 8508 at commit
|
Test build #41745 has finished for PR 8508 at commit
|
" to adjust the probability of predicting each class." + | ||
" Array must have length equal to the number of classes, with values >= 0." + | ||
" The class with largest value p/t is predicted, where p is the original" + | ||
" probability of that class and t is the class' threshold.") | ||
threshold = Param(Params._dummy(), "threshold", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should also extract a HasThreshold
mixin for binary classifier thresholds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
threshold
is a deprecated parameter, it is replaced by thresholds
. LogisticRegression still reserve threshold
is just for binary compatibility. So I think we don't need to extract HasThreshold
as shared Param. @jkbradley
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the HasThresholds
trait mixin in ml.LogisticRegression
is actually an artifact resulting from transient dependency through ProbabilisticClassifier
. We don't actually support multi-class classification in ml.LogisticRegression
ATM and did quite a bit of work to make the API less confusing.
After mutli-class is supported I think it makes sense to use HasThresholds
, but for the time being I would prefer we only use HasThreshold
in the Python API.
LGTM 👍, some minor formatting comments and a suggestion. |
""" | ||
setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", \ | ||
maxIter=100, regParam=0.1, elasticNetParam=0.0, tol=1e-6, fitIntercept=True, \ | ||
threshold=0.5, thresholds=None, \ | ||
probabilityCol="probability", rawPredictionCol="rawPrediction") | ||
threshold=0.5, thresholds=None, probabilityCol="probability", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add \
Test build #42319 has finished for PR 8508 at commit
|
Merged into master. Thanks! |
LinearRegression and LogisticRegression lack of some Params for Python, and some Params are not shared classes which lead we need to write them for each class. These kinds of Params are list here:
Here we implement them in shared params at Python side and make LinearRegression/LogisticRegression parameters peer with Scala one.